Declarative VM management for a container orchestrator in a virtualized computing system

ABSTRACT

An example virtualized computing system includes a host cluster having a virtualization layer executing on hardware platforms of hosts, the virtualization layer supporting execution of virtual machines (VMs), the VMs including pod VMs and native VMs, the pod VMs including container engines supporting execution of containers in the pod VMs, the native VMs including applications executing on guest operating systems; and an orchestration control plane integrated with the virtualization layer, the orchestration control plane including a master server having a pod VM controller to manage lifecycles of the pod VMs and a native VM controller to manage lifecycles of the native VMs.

CROSS-REFERENCE

This application is a continuation of U.S. patent application Ser. No. 17/153,296, filed Jan. 20, 2021, which is incorporated by reference herein.

BACKGROUND

Applications today are deployed onto a combination of virtual machines (VMs), containers, application services, and more. For deploying such applications, a container orchestrator (CO) known as Kubernetes® has gained in popularity among application developers. Kubernetes provides a platform for automating deployment, scaling, and operations of application containers across clusters of hosts. It offers flexibility in application development and offers several useful tools for scaling.

In a Kubernetes system, containers are grouped into logical units called “pods” that execute on nodes in a cluster (also referred to as a “node cluster”). Containers in the same pod share the same resources and network and maintain a degree of isolation from containers in other pods. The pods are distributed across nodes of the cluster. In a typical deployment, a node includes an operating system (OS), such as Linux®, and a container engine executing on top of the OS that supports the containers of the pod. A node can be a physical server or a VM.

Computer virtualization is a technique that involves encapsulating a physical computing machine platform into virtual machine(s) executing under control of virtualization software on a hardware computing platform or “host.” A virtual machine (VM) provides virtual hardware abstractions for processor, memory, storage, and the like to a guest operating system. The virtualization software, also referred to as a “hypervisor,” includes one or more virtual machine monitors (VMMs) to provide execution environment(s) for the virtual machine(s). VMs allow for greater operating system diversity, isolation, and customization than do containers. Users have made considerable investments in making their applications run well on VMs and leveraging differentiating technologies of the underlying virtualized computing system. It is thus desirable to bring VMs into container orchestration systems like Kubernetes to allow a single management and deployment paradigm.

SUMMARY

In an embodiment, a virtualized computing system includes: a host cluster having a virtualization layer executing on hardware platforms of hosts, the virtualization layer supporting execution of virtual machines (VMs), the VMs including pod VMs and native VMs, the pod VMs including container engines supporting execution of containers in the pod VMs, the native VMs including applications executing on guest operating systems; and an orchestration control plane integrated with the virtualization layer, the orchestration control plane including a master server having a pod VM controller to manage lifecycles of the pod VMs and a native VM controller to manage lifecycles of the native VMs.

Further embodiments include a non-transitory computer-readable storage medium comprising instructions that cause a computer system to carry out the above methods, as well as a computer system configured to carry out the above methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a virtualized computing system in which embodiments described herein may be implemented.

FIG. 2 is a block diagram depicting a software platform according to an embodiment.

FIG. 3 is a block diagram of a supervisor Kubernetes master according to an embodiment.

FIG. 4 is a block diagram depicting a logical view of a guest cluster executing in a virtualized computing system according to an embodiment.

FIG. 5 is a flow diagram depicting a method of application orchestration in a virtualized computing system according to an embodiment.

FIG. 6 is a flow diagram depicting a method of application orchestration in a virtualized computing system according to another embodiment.

FIG. 7 is a flow diagram depicting a method of application orchestration in a virtualized computing system according to an embodiment.

DETAILED DESCRIPTION

Declarative VM management for a container orchestrator in a virtualized computing system is described. In embodiments described herein, a virtualized computing system includes a software-defined datacenter (SDDC) comprising a server virtualization platform integrated with a logical network platform. The server virtualization platform includes clusters of physical servers (“hosts”) referred to as “host clusters.” Each host cluster includes a virtualization layer, executing on host hardware platforms of the hosts, which supports execution of virtual machines (VMs). A virtualization management server manages host clusters, the virtualization layers, and the VMs executing thereon.

In embodiments, the virtualization layer of a host cluster is integrated with an orchestration control plane, such as a Kubernetes control plane. This integration enables the host cluster as a “supervisor cluster” that uses VMs to implement both control plane nodes having a Kubernetes control plane, and compute nodes managed by the control plane nodes. For example, Kubernetes pods are implemented as “pod VMs,” each of which includes a kernel and container engine that supports execution of containers. In embodiments, the Kubernetes control plane of the supervisor cluster is extended to support custom objects in addition to pods, such as VM objects that are implemented using native VMs (as opposed to pod VMs). A virtualization infrastructure administrator (VI admin) can enable a host cluster as a supervisor cluster and provide its functionality to development teams.

In embodiments, the orchestration control plane includes master server(s) with both pod VM controllers and native VM controllers. The pod VM controllers manage the lifecycles of pod VMs. The native VM controllers manage the lifecycles of native VMs executing in parallel to the pod VMs. These and further advantages and aspects of the disclosed techniques are described below with respect to the drawings.

FIG. 1 is a block diagram of a virtualized computing system 100 in which embodiments described herein may be implemented. System 100 includes a cluster of hosts 120 (“host cluster 118”) that may be constructed on server-grade hardware platforms such as x86 architecture platforms. For purposes of clarity, only one host cluster 118 is shown. However, virtualized computing system 100 can include many such host clusters 118. As shown, a hardware platform 122 of each host 120 includes conventional components of a computing device, such as one or more central processing units (CPUs) 160, system memory (e.g., random access memory (RAM) 162), one or more network interface controllers (NICs) 164, and optionally local storage 163. CPUs 160 are configured to execute instructions, for example, executable instructions that perform one or more operations described herein, which may be stored in RAM 162. NICs 164 enable host 120 to communicate with other devices through a physical network 180. Physical network 180 enables communication between hosts 120 and between other components and hosts 120 (other components discussed further herein). Physical network 180 can include a plurality of VLANs to provide external network virtualization as described further herein.

In the embodiment illustrated in FIG. 1, hosts 120 access shared storage 170 by using NICs 164 to connect to network 180. In another embodiment, each host 120 contains a host bus adapter (HBA) through which input/output operations (IOs) are sent to shared storage 170 over a separate network (e.g., a fibre channel (FC) network). Shared storage 170 includes one or more storage arrays, such as a storage area network (SAN), network attached storage (NAS), or the like. Shared storage 170 may comprise magnetic disks, solid-state disks, flash memory, and the like as well as combinations thereof. In some embodiments, hosts 120 include local storage 163 (e.g., hard disk drives, solid-state drives, etc.). Local storage 163 in each host 120 can be aggregated and provisioned as part of a virtual SAN, which is another form of shared storage 170.

A software platform 124 of each host 120 provides a virtualization layer, referred to herein as a hypervisor 150, which directly executes on hardware platform 122. In an embodiment, there is no intervening software, such as a host operating system (OS), between hypervisor 150 and hardware platform 122. Thus, hypervisor 150 is a Type-1 hypervisor (also known as a “bare-metal” hypervisor). As a result, the virtualization layer in host cluster 118 (collectively hypervisors 150) is a bare-metal virtualization layer executing directly on host hardware platforms. Hypervisor 150 abstracts processor, memory, storage, and network resources of hardware platform 122 to provide a virtual machine execution space within which multiple virtual machines (VMs) may be concurrently instantiated and executed. One example of hypervisor 150 that may be configured and used in embodiments described herein is a VMware ESXi™ hypervisor provided as part of the VMware vSphere® solution made commercially available by VMware, Inc. of Palo Alto, CA.

In the example of FIG. 1, host cluster 118 is enabled as a “supervisor cluster,” described further herein, and thus VMs executing on each host 120 include pod VMs 130 and native VMs 140. A pod VM 130 is a virtual machine that includes a kernel and container engine that supports execution of containers, as well as an agent (referred to as a pod VM agent) that cooperates with a controller of an orchestration control plane 115 executing in hypervisor 150 (referred to as a pod VM controller). An example of pod VM 130 is described further below with respect to FIG. 2. VMs 130/140 support applications 141 deployed onto host cluster 118, which can include containerized applications (e.g., executing in either pod VMs 130 or native VMs 140) and applications executing directly on guest operating systems (non-containerized) (e.g., executing in native VMs 140). One specific application discussed further herein is a guest cluster executing as a virtual extension of a supervisor cluster. Some VMs 130/140, shown as support VMs 145, have specific functions within host cluster 118. For example, support VMs 145 can provide control plane functions, edge transport functions, and the like. An embodiment of software platform 124 is discussed further below with respect to FIG. 2.

Host cluster 118 is configured with a software-defined (SD) network layer 175. SD network layer 175 includes logical network services executing on virtualized infrastructure in host cluster 118. The virtualized infrastructure that supports the logical network services includes hypervisor-based components, such as resource pools, distributed switches, distributed switch port groups and uplinks, etc., as well as VM-based components, such as router control VMs, load balancer VMs, edge service VMs, etc. Logical network services include logical switches, logical routers, logical firewalls, logical virtual private networks (VPNs), logical load balancers, and the like, implemented on top of the virtualized infrastructure. In embodiments, virtualized computing system 100 includes edge transport nodes 178 that provide an interface of host cluster 118 to an external network (e.g., a corporate network, the public Internet, etc.). Edge transport nodes 178 can include a gateway between the internal logical networking of host cluster 118 and the external network. Edge transport nodes 178 can be physical servers or VMs. For example, edge transport nodes 178 can be implemented in support VMs 145 and include a gateway of SD network layer 175. Various clients 119 can access service(s) in virtualized computing system 100 through edge transport nodes 178 (including VM management client 106 and Kubernetes client 102, which are logically shown as being separate by way of example).

Virtualization management server 116 is a physical or virtual server that manages host cluster 118 and the virtualization layer therein. Virtualization management server 116 installs agent(s) 152 in hypervisor 150 to add a host 120 as a managed entity. Virtualization management server 116 logically groups hosts 120 into host cluster 118 to provide cluster-level functions to hosts 120, such as VM migration between hosts 120 (e.g., for load balancing), distributed power management, dynamic VM placement according to affinity and anti-affinity rules, and high-availability. The number of hosts 120 in host cluster 118 may be one or many. Virtualization management server 116 can manage more than one host cluster 118.

In an embodiment, virtualization management server 116 further enables host cluster 118 as a supervisor cluster 101. Virtualization management server 116 installs additional agents 152 in hypervisor 150 to add host 120 to supervisor cluster 101. Supervisor cluster 101 integrates an orchestration control plane 115 with host cluster 118. In embodiments, orchestration control plane 115 includes software components that support a container orchestrator, such as Kubernetes, to deploy and manage applications on host cluster 118. By way of example, a Kubernetes container orchestrator is described herein. In supervisor cluster 101, hosts 120 become nodes of a Kubernetes cluster and pod VMs 130 executing on hosts 120 implement Kubernetes pods. Orchestration control plane 115 includes supervisor Kubernetes master 104 and agents 152 executing in the virtualization layer (e.g., hypervisors 150). Supervisor Kubernetes master 104 includes control plane components of Kubernetes, as well as custom controllers, custom plugins, a scheduler extender, and the like that extend Kubernetes to interface with virtualization management server 116 and the virtualization layer. For purposes of clarity, supervisor Kubernetes master 104 is shown as a separate logical entity. For practical implementations, supervisor Kubernetes master 104 is implemented as one or more VM(s) 130/140 in host cluster 118. Further, although only one supervisor Kubernetes master 104 is shown, supervisor cluster 101 can include more than one supervisor Kubernetes master 104 in a logical cluster for redundancy and load balancing.

In an embodiment, virtualized computing system 100 further includes a storage service 110 that implements a storage provider in virtualized computing system 100 for container orchestrators. In embodiments, storage service 110 manages lifecycles of storage volumes (e.g., virtual disks) that back persistent volumes used by containerized applications executing in host cluster 118. A container orchestrator such as Kubernetes cooperates with storage service 110 to provide persistent storage for the deployed applications. In the embodiment of FIG. 1, supervisor Kubernetes master 104 cooperates with storage service 110 to deploy and manage persistent storage in the supervisor cluster environment. Other embodiments described below include a vanilla container orchestrator environment and a guest cluster environment. Storage service 110 can execute in virtualization management server 116 as shown or operate independently from virtualization management server 116 (e.g., as an independent physical or virtual server).

In an embodiment, virtualized computing system 100 further includes a network manager 112. Network manager 112 is a physical or virtual server that orchestrates SD network layer 175. In an embodiment, network manager 112 comprises one or more virtual servers deployed as VMs. Network manager 112 installs additional agents 152 in hypervisor 150 to add a host 120 as a managed entity, referred to as a transport node. In this manner, host cluster 118 can be a cluster 103 of transport nodes. One example of an SD networking platform that can be configured and used in embodiments described herein as network manager 112 and SD network layer 175 is a VMware NSX® platform made commercially available by VMware, Inc. of Palo Alto, CA.

Network manager 112 can deploy one or more transport zones in virtualized computing system 100, including VLAN transport zone(s) and an overlay transport zone. A VLAN transport zone spans a set of hosts 120 (e.g., host cluster 118) and is backed by external network virtualization of physical network 180 (e.g., a VLAN). One example VLAN transport zone uses a management VLAN 182 on physical network 180 that enables a management network connecting hosts 120 and the VI control plane (e.g., virtualization management server 116 and network manager 112). An overlay transport zone using overlay VLAN 184 on physical network 180 enables an overlay network that spans a set of hosts 120 (e.g., host cluster 118) and provides internal network virtualization using software components (e.g., the virtualization layer and services executing in VMs). Host-to-host traffic for the overlay transport zone is carried by physical network 180 on the overlay VLAN 184 using layer-2-over-layer-3 tunnels. Network manager 112 can configure SD network layer 175 to provide a cluster network 186 using the overlay network. The overlay transport zone can be extended into at least one of edge transport nodes 178 to provide ingress/egress between cluster network 186 and an external network.

In an embodiment, system 100 further includes an image registry 190. As described herein, containers of supervisor cluster 101 execute in pod VMs 130. The containers in pod VMs 130 are spun up from container images managed by image registry 190. Image registry 190 manages images and image repositories for use in supplying images for containerized applications.

Virtualization management server 116 and network manager 112 comprise a virtual infrastructure (VI) control plane 113 of virtualized computing system 100. Virtualization management server 116 can include a supervisor cluster service 109, storage service 110, and VI services 108. Supervisor cluster service 109 enables host cluster 118 as supervisor cluster 101 and deploys the components of orchestration control plane 115. VI services 108 include various virtualization management services, such as a distributed resource scheduler (DRS), high-availability (HA) service, single sign-on (SSO) service, virtualization management daemon, and the like. DRS is configured to aggregate the resources of host cluster 118 to provide resource pools and enforce resource allocation policies. DRS also provides resource management in the form of load balancing, power management, VM placement, and the like. HA service is configured to pool VMs and hosts into a monitored cluster and, in the event of a failure, restart VMs on alternate hosts in the cluster. A single host is elected as a master, which communicates with the HA service and monitors the state of protected VMs on subordinate hosts. The HA service uses admission control to ensure enough resources are reserved in the cluster for VM recovery when a host fails. SSO service comprises security token service, administration server, directory service, identity management service, and the like configured to implement an SSO platform for authenticating users. The virtualization management daemon is configured to manage objects, such as data centers, clusters, hosts, VMs, resource pools, datastores, and the like.

A VI admin can interact with virtualization management server 116 through a VM management client 106. Through VM management client 106, a VI admin commands virtualization management server 116 to form host cluster 118, configure resource pools, resource allocation policies, and other cluster-level functions, configure storage and networking, enable supervisor cluster 101, deploy and manage image registry 190, and the like.

Kubernetes client 102 represents an input interface for a user to supervisor Kubernetes master 104. Kubernetes client 102 is commonly referred to as kubectl. Through Kubernetes client 102, a user submits desired states of the Kubernetes system, e.g., as YAML documents, to supervisor Kubernetes master 104. In embodiments, the user submits the desired states within the scope of a supervisor namespace. A “supervisor namespace” is a shared abstraction between VI control plane 113 and orchestration control plane 115. Each supervisor namespace provides resource-constrained and authorization-constrained units of multi-tenancy. A supervisor namespace provides resource constraints, user-access constraints, and policies (e.g., storage policies, network policies, etc.). Resource constraints can be expressed as quotas, limits, and the like with respect to compute (CPU and memory), storage, and networking of the virtualized infrastructure (host cluster 118, shared storage 170, SD network layer 175). User-access constraints include definitions of users, roles, permissions, bindings of roles to users, and the like. Each supervisor namespace is expressed within orchestration control plane 115 using a namespace native to orchestration control plane 115 (e.g., a Kubernetes namespace or generally a “native namespace”), which allows users to deploy applications in supervisor cluster 101 within the scope of supervisor namespaces. In this manner, the user interacts with supervisor Kubernetes master 104 to deploy applications in supervisor cluster 101 within defined supervisor namespaces.
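
For illustration only, the following minimal Python sketch shows the kinds of constraints a supervisor namespace carries (resource quotas and limits, user-access bindings, and policies). The dictionary keys and example values are hypothetical and are not the actual configuration schema of the embodiments.

    # Illustrative sketch; keys and values are hypothetical.
    supervisor_namespace = {
        "name": "team-a",
        "resource_constraints": {            # quotas/limits on compute, storage, networking
            "cpu_limit_mhz": 20000,
            "memory_limit_mib": 65536,
            "storage_quota_gib": 500,
        },
        "user_access": {                     # roles, permissions, and role bindings
            "roles": {"edit": ["devops-group"], "view": ["auditors"]},
        },
        "policies": {"storage": "gold", "network": "default"},
    }
    print(sorted(supervisor_namespace["resource_constraints"]))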

While FIG. 1 shows an example of a supervisor cluster 101, the techniques described herein do not require a supervisor cluster 101. In some embodiments, host cluster 118 is not enabled as a supervisor cluster 101. In such case, supervisor Kubernetes master 104, Kubernetes client 102, pod VMs 130, supervisor cluster service 109, and image registry 190 can be omitted. While host cluster 118 is shown as being enabled as a transport node cluster 103, in other embodiments network manager 112 can be omitted. In such case, virtualization management server 116 functions to configure SD network layer 175.

FIG. 2 is a block diagram depicting software platform 124 according to an embodiment. As described above, software platform 124 of host 120 includes hypervisor 150 that supports execution of VMs, such as pod VMs 130, native VMs 140, and support VMs 145. In an embodiment, hypervisor 150 includes a VM management daemon 211, a host daemon 214, a pod VM controller 216, a native VM controller 217, an image service 218, and network agents 222. VM management daemon 211 is an agent 152 installed by virtualization management server 116. VM management daemon 211 provides an interface to host daemon 214 for virtualization management server 116. Host daemon 214 is configured to create, configure, and remove VMs (e.g., pod VMs 130 and native VMs 140).

Pod VM controller 216 is an agent 152 of orchestration control plane 115 for supervisor cluster 101 and allows supervisor Kubernetes master 104 to interact with hypervisor 150. Pod VM controller 216 configures the respective host as a node in supervisor cluster 101. Pod VM controller 216 manages the lifecycle of pod VMs 130, such as determining when to spin up or delete a pod VM. Pod VM controller 216 also ensures that any pod dependencies, such as container images, networks, and volumes, are available and correctly configured. Pod VM controller 216 is omitted if host cluster 118 is not enabled as a supervisor cluster 101. Native VM controller 217 is an agent 152 of orchestration control plane 115 for supervisor cluster 101 and allows supervisor Kubernetes master 104 to interact with hypervisor 150 to manage lifecycles of native VMs 140 and applications executing therein. While shown separately from pod VM controller 216, in some embodiments both pod VM controller 216 and native VM controller 217 can be functions of a single controller.

Image service 218 is configured to pull container images from image registry 190 and store them in shared storage 170 such that the container images can be mounted by pod VMs 130. Image service 218 is also responsible for managing the storage available for container images within shared storage 170. This includes managing authentication with image registry 190, assuring provenance of container images by verifying signatures, updating container images when necessary, and garbage collecting unused container images. Image service 218 communicates with pod VM controller 216 during spin-up and configuration of pod VMs 130. In some embodiments, image service 218 is part of pod VM controller 216. In embodiments, image service 218 utilizes system VMs 130/140 in support VMs 145 to fetch images, convert images to container image virtual disks, and cache container image virtual disks in shared storage 170.
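
For illustration, the following Python sketch outlines the image handling flow described above: fetch an image, verify its provenance (here approximated by a digest check), convert it to a container image virtual disk, and cache the result. All helper functions and names are hypothetical stand-ins, not the implementation of image service 218.

    # Hypothetical sketch of fetch / verify / convert / cache for container images.
    import hashlib

    cache = {}   # stand-in for container image virtual disks cached in shared storage

    def fetch_image(ref):
        return b"layer-data-for-" + ref.encode()           # stand-in for a registry pull

    def verify_signature(data, expected_digest):
        return hashlib.sha256(data).hexdigest() == expected_digest

    def convert_to_virtual_disk(data):
        return {"format": "vmdk", "size": len(data)}        # stand-in conversion

    def ensure_image(ref, expected_digest):
        if ref in cache:
            return cache[ref]                               # reuse the cached virtual disk
        data = fetch_image(ref)
        if not verify_signature(data, expected_digest):
            raise ValueError(f"signature mismatch for {ref}")
        cache[ref] = convert_to_virtual_disk(data)
        return cache[ref]

    digest = hashlib.sha256(b"layer-data-for-registry.local/app:1.0").hexdigest()
    print(ensure_image("registry.local/app:1.0", digest))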

Network agents 222 comprise agents 152 installed by network manager 112. Network agents 222 are configured to cooperate with network manager 112 to implement logical network services. Network agents 222 configure the respective host as a transport node in a cluster 103 of transport nodes.

Each pod VM 130 has one or more containers 206 running therein in an execution space managed by container engine 208. The lifecycle of containers 206 is managed by pod VM agent 212. Both container engine 208 and pod VM agent 212 execute on top of a kernel 210 (e.g., a Linux® kernel). Each native VM 140 has applications 202 running therein on top of an OS 204. Native VMs 140 do not include pod VM agents and are isolated from pod VM controller 216. Rather, native VMs 140 include management agents 213 that communicate with native VM controller 217. Container engine 208 can be an industry-standard container engine, such as libcontainer, runc, or containerd. Pod VMs 130, pod VM controller 216, native VM controller 217, and image service 218 are omitted if host cluster 118 is not enabled as a supervisor cluster 101.

FIG. 3 is a block diagram of supervisor Kubernetes master 104 according to an embodiment. Supervisor Kubernetes master 104 includes application programming interface (API) server 302, a state database 303, a scheduler 304, a scheduler extender 306, controllers 308, and plugins 319. API server 302 includes the Kubernetes API server, kube-api-server (“Kubernetes API 326”), and custom APIs 305. Custom APIs 305 are API extensions of Kubernetes API 326 using either the custom resource/operator extension pattern or the API extension server pattern. Custom APIs 305 are used to create and manage custom resources, such as VM objects. API server 302 provides a declarative schema for creating, updating, deleting, and viewing objects.

State database 303 stores the state of supervisor cluster 101 (e.g., etcd) as objects created by API server 302. A user can provide application specification data to API server 302 that defines various objects supported by the API (e.g., as a YAML document). The objects have specifications that represent the desired state. State database 303 stores the objects defined by application specification data as part of the supervisor cluster state. Standard Kubernetes objects (“Kubernetes objects 310”) include namespaces, nodes, pods, config maps, secrets, among others. Custom objects are resources defined through custom APIs 305 (e.g., VM objects 307).

Namespaces provide scope for objects. Namespaces are objects themselves maintained in state database 303. A namespace can include resource quotas, limit ranges, role bindings, and the like that are applied to objects declared within its scope. VI control plane 113 creates and manages supervisor namespaces for supervisor cluster 101. A supervisor namespace is a resource-constrained and authorization-constrained unit of multi-tenancy managed by virtualization management server 116. Namespaces inherit constraints from corresponding supervisor cluster namespaces. Config maps include configuration information for applications managed by supervisor Kubernetes master 104. Secrets include sensitive information for use by applications managed by supervisor Kubernetes master 104 (e.g., passwords, keys, tokens, etc.). The configuration information and the secret information stored by config maps and secrets is generally referred to herein as decoupled information. Decoupled information is information needed by the managed applications, but which is decoupled from the application code.

Controllers 308 can include, for example, standard Kubernetes controllers (“Kubernetes controllers 316”) (e.g., kube-controller-manager controllers, cloud-controller-manager controllers, etc.) and custom controllers 318. Custom controllers 318 include controllers for managing the lifecycle of Kubernetes objects 310 and custom objects. For example, custom controllers 318 can include a VM controller 328 configured to manage VM objects 307 and a pod VM lifecycle controller (PLC) 330 configured to manage pods. A controller 308 tracks objects in state database 303 of at least one resource type. Controller(s) 318 are responsible for making the current state of supervisor cluster 101 come closer to the desired state as stored in state database 303. A controller 318 can carry out action(s) by itself, send messages to API server 302 to have side effects, and/or interact with external systems.
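
For illustration, the following Python sketch shows a generic reconcile loop of the kind a custom controller applies to drive the current state toward the desired state stored in the state database. The data structures and function names are hypothetical; the actual controllers act through API server 302, the virtualization layer, and external systems.

    # Illustrative reconcile-loop sketch; all names are hypothetical.
    import time

    desired_state = {"vm-1": {"image": "ubuntu-20.04", "powered_on": True}}   # desired spec
    current_state = {}                                                        # observed state

    def reconcile_once(desired, current):
        """Drive the current state one step closer to the desired state."""
        for name, spec in desired.items():
            if name not in current:
                print(f"creating VM {name} from image {spec['image']}")
                current[name] = dict(spec)
            elif current[name] != spec:
                print(f"updating VM {name} to match desired spec")
                current[name] = dict(spec)
        for name in list(current):
            if name not in desired:
                print(f"deleting VM {name} (no longer desired)")
                del current[name]

    if __name__ == "__main__":
        for _ in range(2):          # a real controller loops on watch events
            reconcile_once(desired_state, current_state)
            time.sleep(0.1)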

Plugins 319 can include, for example, a network plugin 312 and a storage plugin 314. Plugins 319 provide a well-defined interface to replace a set of functionality of the Kubernetes control plane. Network plugin 312 is responsible for configuration of SD network layer 175 to deploy and configure the cluster network. Network plugin 312 cooperates with virtualization management server 116 and/or network manager 112 to deploy logical network services of the cluster network. Network plugin 312 also monitors state database 303 for custom objects 307, such as NIP objects. Storage plugin 314 is responsible for providing a standardized interface for persistent storage lifecycle and management to satisfy the needs of resources requiring persistent storage. Storage plugin 314 cooperates with virtualization management server 116 and/or storage service 110 to implement the appropriate persistent storage volumes in shared storage 170.

Scheduler 304 watches state database 303 for newly created pods with no assigned node. A pod is an object supported by API server 302 that is a group of one or more containers, with network and storage, and a specification on how to execute. Scheduler 304 selects candidate nodes in supervisor cluster 101 for pods. Scheduler 304 cooperates with scheduler extender 306, which interfaces with virtualization management server 116. Scheduler extender 306 cooperates with virtualization management server 116 (e.g., such as with DRS) to select nodes from candidate sets of nodes and provide identities of hosts 120 corresponding to the selected nodes. For each pod, scheduler 304 also converts the pod specification to a pod VM specification, and scheduler extender 306 asks virtualization management server 116 to reserve a pod VM on the selected host 120. Scheduler 304 updates pods in state database 303 with host identifiers.
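
As an illustration of this selection flow, the Python sketch below filters candidate nodes for a pod and then defers the final choice to a placeholder extender function standing in for the cooperation between scheduler extender 306 and the virtualization management server. The node and pod structures are hypothetical.

    # Illustrative scheduling sketch; the extender call is a placeholder.
    def filter_candidates(nodes, pod):
        """Keep nodes that satisfy the pod's simple constraints (node selector)."""
        selector = pod.get("node_selector", {})
        return [n for n in nodes if all(n["labels"].get(k) == v for k, v in selector.items())]

    def extender_select(candidates):
        """Placeholder for asking the VI control plane to pick zero or one node."""
        return candidates[0] if candidates else None

    nodes = [
        {"name": "host-1", "labels": {"zone": "a"}},
        {"name": "host-2", "labels": {"zone": "b"}},
    ]
    pod = {"name": "web-0", "node_selector": {"zone": "b"}}

    chosen = extender_select(filter_candidates(nodes, pod))
    print("scheduled", pod["name"], "to", chosen["name"] if chosen else "no node")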

Kubernetes API 326, state database 303, scheduler 304, and Kubernetes controllers 316 comprise standard components of a Kubernetes system executing on supervisor cluster 101. Custom controllers 318, plugins 319, and scheduler extender 306 comprise custom components of orchestration control plane 115 that integrate the Kubernetes system with host cluster 118 and VI control plane 113.

In embodiments, custom APIs 305 enable developers to discover available content and to import existing VMs as new images within their Kubernetes namespace. In embodiments, VM objects 307 that can be specified through custom APIs 305 include VM resources 320, VM image resources 322, VM profile resources 324, network policy resources 325, network resources 327, and service resources 329.

VM image resource 322 enables discovery of available images for consumption via custom APIs 305. VM image resource 322 exposes verbs such as image listing, filtering, and import so that the developer can manage the lifecycle and consumption of images. A single VM image resource 322 describes a reference to an existing VM template image in a repository.

VM profile resource 324 is a resource that describes a curated set of VM attributes that can be used to instantiate native VMs. VM profile resource 324 gives the VI admin control over the configuration and policy of the native VMs that are available to the developer. The VI admin can define a set of available VM profile resources 324 in each namespace. The VI admin can create new profiles to balance the requirements of the VI admin, the developer, and those imposed by the underlying hardware. VM profile resource 324 enables definition of classes of information such as virtual CPU and memory capacity exposed to the native VM; resource, availability, and compute policy for the native VM; and special hardware resources (e.g., FPGA, pmem, vGPU, etc.) available to the VM profile.
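
For illustration, a minimal Python sketch of the kind of attribute classes a VM profile resource could curate is shown below; the field names and example values are assumptions for the sketch, not the actual resource schema.

    # Illustrative VM profile sketch; fields and values are hypothetical.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VMProfile:
        name: str
        vcpus: int                       # virtual CPU capacity exposed to the native VM
        memory_mib: int                  # memory capacity exposed to the native VM
        compute_policy: str = "none"     # e.g., availability/placement policy
        devices: List[str] = field(default_factory=list)   # e.g., ["vGPU", "pmem"]

    small = VMProfile(name="small", vcpus=2, memory_mib=4096)
    gpu = VMProfile(name="gpu-large", vcpus=8, memory_mib=32768, devices=["vGPU"])
    print(small, gpu, sep="\n")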

Unlike pod VMs, native VMs have their own network requirements. Multipath NICs, public/private NICs, and legacy application requirements can drive the need for support of multiple vNICs, each of which may have a custom network configuration. As an example, a clustered SQL Server expects to have at least two vNICs on separate networks: one public and one private. The private vNIC is used for IPC and heartbeat traffic with its peers. As a consequence of this need for flexibility, network policy resource 325 allows the VI admin to define the set of available networks for native VMs.

Network resource 327 represents a single network to be consumed by a native VM. In embodiments, network resource 327 is a simple resource, abstracting the details of an underlying virtual port group that the network represents. For example, network resource 327 may be one of the following types: standard port group, distributed port group, tier-1 logical router in SD network layer 175, and the like. The available networks are configured by the VI admin for each namespace via a network policy resource 325. Network resources 327 are used to attach additional network interfaces to a specific virtual network.

Service resource 329 binds native VM instances to Kubernetes services in order to expose a network service from the native VM to pods and other native VMs. In embodiments, service resource 329 includes a label selector that is used to match any labels applied to any VM resources 320. Once a service resource 329 and a VM resource 320 have been coupled, a delegate service and endpoints resource is installed in order to enable network access to the native VM via the service DNS name or IP address.
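
The following Python sketch illustrates, with hypothetical names, how a label selector on a service resource could be matched against labels on VM resources to determine the native VMs backing the service; the delegate service and endpoints resource would then expose the matching VMs.

    # Illustrative label-selector matching; structures are hypothetical.
    def matches(selector, labels):
        return all(labels.get(k) == v for k, v in selector.items())

    vm_resources = [
        {"name": "db-0", "labels": {"app": "sql", "tier": "db"}},
        {"name": "web-0", "labels": {"app": "web"}},
    ]
    service = {"name": "sql-svc", "selector": {"app": "sql"}}

    backends = [vm["name"] for vm in vm_resources if matches(service["selector"], vm["labels"])]
    print(service["name"], "endpoints:", backends)   # exposed via a delegate service/endpoints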

A VM resource 320 combines all of the above resources to generate a desired native VM. In embodiments, a VM resource 320 specifies a VM image resource 322 to use as the master image. Optionally, a developer can override additional attributes of the cloned image. In embodiments, a developer can override image attributes by specifying a VM profile resource 324. In other embodiments, a developer can override image attributes by explicit specification of a desired attribute to override. VM resources 320 specify a configuration that is mapped to underlying infrastructure features by VM controller 328, including but not limited to: VM Name, Virtual Resource Capacity, Network to Virtual NIC binding, DNS Configuration, Volume Customization, VM Customization scripts, and VM Placement and Affinity policy.
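
For illustration, the following schematic Python dictionary shows a hypothetical VM resource tying together an image, a profile, networks, and customization settings. The field names are illustrative only and do not reproduce the actual schema of custom APIs 305.

    # Hypothetical, schematic VM resource specification.
    vm_resource = {
        "kind": "VirtualMachine",
        "metadata": {"name": "legacy-db", "namespace": "team-a", "labels": {"app": "sql"}},
        "spec": {
            "image": "centos7-sql-template",            # references a VM image resource
            "profile": "gpu-large",                      # references a VM profile resource
            "networks": ["public-net", "private-net"],   # references network resources
            "dns": ["10.0.0.2"],
            "customization": {"script": "bootstrap.sh"},
            "placementPolicy": "anti-affinity-db",
        },
    }
    print(vm_resource["metadata"]["name"], "->", vm_resource["spec"]["image"])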

FIG. 4 is a block diagram depicting a logical view of a virtualized computing system according to an embodiment. Supervisor cluster 101 is implemented by a software-defined data center (SDDC) 402. SDDC 402 includes virtualized computing system 100 shown in FIG. 1, including host cluster 118, virtualization management server 116, network manager 112, shared storage 170, and SD network layer 175. SDDC 402 includes VI control plane 113 for managing a virtualization layer of host cluster 118, along with shared storage 170 and SD network layer 175. A VI admin interacts with VM management server 116 (and optionally network manager 112) of VI control plane 113 to configure SDDC 402 to implement supervisor cluster 101.

Supervisor cluster 101 includes orchestration control plane 115, which includes supervisor Kubernetes master(s) 104. The VI admin interacts with VM management server 116 to create supervisor namespaces, including supervisor namespace 412. Each supervisor namespace includes a resource pool and authorization constraints. The resource pool includes various resource constraints on the supervisor namespace (e.g., reservation, limits, and share (RLS) constraints). Authorization constraints provide for which roles are permitted to perform which operations in the supervisor namespace (e.g., allowing VI admin to create, manage access, allocate resources, view, and create objects; allowing DevOps to view and create objects; etc.). A user interacts with supervisor Kubernetes master 104 to deploy applications on supervisor cluster 101 within scopes of supervisor namespaces. In the example, the user deploys containerized applications 428 on pod VMs 130 and non-containerized applications 429 on native VMs 140. Non-containerized applications 429 execute on a guest operating system in a native VM 140 exclusive of any container engine.

Kubernetes allows passing of configuration and secret information to containerized applications 428. However, standard Kubernetes does not extend this functionality beyond pod-based workloads (i.e., containerized applications executing in pods). Embodiments described herein extend this functionality to applications executing in native VMs (e.g., non-containerized applications 429). In embodiments, supervisor Kubernetes master 104 manages the lifecycle of decoupled information 403 (e.g., config maps and secrets) for non-containerized applications 429. That is, supervisor Kubernetes master 104 performs create, read, update, and delete operations on objects that include decoupled information 403. Supervisor Kubernetes master 104 provides decoupled information 403 to native VM controller 217 upon deployment of non-containerized applications 429 to native VMs 140. Native VM controller 217 cooperates with management agent 213 executing in each native VM 140 to provide decoupled information 403 for use by non-containerized applications 429. Management agent 213 in each native VM 140 exposes decoupled information 403 for access by non-containerized applications 429. In embodiments, management agent 213 creates environment variables accessible by non-containerized applications 429. In embodiments, management agent 213 creates files in a filesystem accessible by native VMs 140, which in turn can be read by non-containerized applications 429. In some embodiments, the files can be resident in system memory (e.g., RAM). Supervisor Kubernetes master 104 can provide updates to decoupled information 403 to native VM controller 217, which in turn provides the updates to management agent 213 for use by non-containerized applications 429.
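
The Python sketch below illustrates one way a management agent could expose decoupled information to a non-containerized application: writing entries as files in a well-known directory and passing configuration values as environment variables to a launched process. The function name and directory layout are assumptions for the sketch, not the implementation of management agent 213.

    # Illustrative sketch of exposing decoupled information; names are hypothetical.
    import os
    import subprocess
    import tempfile

    def expose_decoupled_info(config_map, secrets, target_dir):
        # Write each entry as a file the application can read.
        for key, value in {**config_map, **secrets}.items():
            with open(os.path.join(target_dir, key), "w") as f:
                f.write(value)
        # Also pass the config map entries as environment variables for the application.
        return dict(os.environ, **config_map)

    if __name__ == "__main__":
        target = tempfile.mkdtemp(prefix="decoupled-")
        env = expose_decoupled_info({"DB_HOST": "10.0.0.5"}, {"DB_PASSWORD": "example"}, target)
        subprocess.run(["env"], env=env, check=True)   # stand-in for launching the application
        print("files written to", target)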

When specifying a non-containerized application at supervisor Kubernetes master 104, the user can specify the decoupled information 403 upon which the application relies and how to consume the decoupled information (e.g., as environment variables, as files, etc.). Supervisor Kubernetes master 104 schedules the non-containerized application to run in a VM object implemented by a native VM 140. Upon deployment of native VM 140, management agent 213 establishes a connection with native VM controller 217 using a hypervisor-guest channel (e.g., a virtual socket connection). In embodiments, management agent 213 communicates with native VM controller 217 over the hypervisor-guest channel using a remote procedure call (RPC) protocol. Management agent 213 sets up decoupled information 403 as specified for each non-containerized application 429 (e.g., environment variables, files, etc.). Management agent 213 updates decoupled information 403 exposed to non-containerized applications 429 as updates are received from supervisor Kubernetes master 104 through native VM controller 217.
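
As an illustration of a hypervisor-guest channel, the following guest-side Python sketch opens a virtual socket (AF_VSOCK, available on Linux guests with vsock support) and issues a toy RPC request. The host CID, port number, and message format are hypothetical; they do not describe the actual protocol between management agent 213 and native VM controller 217.

    # Illustrative virtual-socket RPC sketch; CID, port, and message format are hypothetical.
    import json
    import socket

    HOST_CID = 2          # conventionally addresses the hypervisor/host
    CONTROL_PORT = 5000   # hypothetical port for the controller endpoint

    def fetch_decoupled_info():
        with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:  # requires vsock support
            s.connect((HOST_CID, CONTROL_PORT))
            s.sendall(json.dumps({"method": "getDecoupledInfo"}).encode())  # toy RPC request
            return json.loads(s.recv(65536).decode())

    if __name__ == "__main__":
        try:
            print(fetch_decoupled_info())
        except (OSError, AttributeError) as err:
            print("no hypervisor-guest channel available:", err)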

FIG. 5 is a flow diagram depicting a method 500 of application orchestration in a virtualized computing system according to an embodiment. Method 500 can be performed by software in supervisor cluster 101 executing on CPU, memory, storage, and network resources managed by virtualization layer(s) (e.g., hypervisor(s)) or host operating system(s). Method 500 can be understood with reference to FIGS. 3-4.

Method 500 begins at step 502, where supervisor Kubernetes master 104 receives a specification for an application to be deployed using a native VM. The specification is defined using custom APIs 305. For example, at step 504, a user can specify a VM image resource 322. At step 506, a user can specify a VM profile 324. At step 508, a user can specify a network policy 325 and/or network resources 327. At step 510, a user can optionally specify a VM service 329. A user can tie all of these objects together by specifying a VM resource 320.

At step 512, VM controller 328, which is part of orchestration control plane 115, cooperates with virtualization management server 116, which is part of VI control plane 113, to select a host 120 for deploying a native VM. Thus, the user specifies the native VM to orchestration control plane 115, which in turn cooperates with VI control plane 113 to deploy the native VM. At step 514, VM controller 328 cooperates with virtualization management server 116 to deploy a native VM 140 as specified to the selected host. Native VM 140 is deployed alongside any pod VMs 130 executing in the selected host and managed by orchestration control plane 115. Thus, the orchestration control plane controls deployment of both native VM 140 and pod VM(s) 130. For example, at step 516, VM controller 328 and virtualization management server 116 clone a VM from a selected VM image after resource creation on the selected host. At step 518, VM controller 328 and virtualization management server 116 apply policies (e.g., VM profile(s), network policy, etc.) to the native VM. At step 520, VM controller 328 and virtualization management server 116 start native VM 140 on the selected host as configured.
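
For illustration, steps 512-520 can be summarized as the following Python sketch with placeholder functions; the real flow is carried out by VM controller 328 in cooperation with virtualization management server 116.

    # Illustrative deployment-sequence sketch; all functions are placeholders.
    def select_host(hosts):                             # step 512: pick a host for the native VM
        return hosts[0]

    def clone_from_image(host, image):                  # step 516: clone from the selected VM image
        return {"host": host, "image": image, "powered_on": False}

    def apply_policies(vm, profile, network_policy):    # step 518: apply profile and network policy
        vm.update({"profile": profile, "network_policy": network_policy})
        return vm

    def power_on(vm):                                   # step 520: start the configured native VM
        vm["powered_on"] = True
        return vm

    vm = clone_from_image(select_host(["host-1", "host-2"]), "centos7-sql-template")
    vm = power_on(apply_policies(vm, "small", "team-a-networks"))
    print(vm)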

At step 522, management agent 213 receives config maps/secrets from supervisor Kubernetes master 104 through native VM controller 217. Management agent 213 exposes the configuration/secret information in the config maps/secrets to the application as specified by the user. At step 524, VM controller 328 and virtualization management server 116 cooperate to power down and delete the native VM upon deletion of VM resource 320.

FIG. 6 is a flow diagram depicting a method 600 of application orchestration in a virtualized computing system according to another embodiment. As shown in FIG. 6, method 600 can be performed by VI control plane 113 and orchestration control plane 115, which comprise software executing on CPU, memory, storage, and network resources managed by a virtualization layer (e.g., a hypervisor) and/or host operating system. Method 600 begins at step 602, where a user provides a pod specification to API server 302 to create a new pod. At step 604, scheduler 304 selects candidate nodes for deployment of the pod. Scheduler 304 selects the candidate nodes by filtering on affinity, node selector constraints, etc. At step 606, scheduler extender 306 cooperates with VI services 108 in VM management server 116 to select a node from the set of candidate nodes. VI services 108 selects zero or one node from the list of a plurality of candidate nodes provided by scheduler extender 306.

At step 608, scheduler 304 converts the pod specification to a VM specification for a pod VM 130. For example, scheduler 304 converts CPU and memory requests and limits from the pod specification to the VM specification, with fallback to reasonable defaults. The VM specification includes a vNIC device attached to the logical network used by pod VMs 130. The guest OS in the VM specification is specified to be kernel 210 with container engine 208. Storage is an ephemeral virtual disk.
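
As a simple illustration of this conversion, the Python sketch below maps CPU and memory requests and limits from a pod specification into a VM specification, falling back to defaults when values are absent. The default values and field names are assumptions for the sketch, not the actual conversion performed by scheduler 304.

    # Illustrative pod-spec-to-VM-spec conversion; defaults and fields are hypothetical.
    DEFAULT_VCPUS = 1
    DEFAULT_MEMORY_MIB = 512

    def pod_spec_to_vm_spec(pod_spec):
        resources = pod_spec.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        return {
            "vcpus": limits.get("cpu", requests.get("cpu", DEFAULT_VCPUS)),
            "memory_mib": limits.get("memory_mib", requests.get("memory_mib", DEFAULT_MEMORY_MIB)),
            "nics": [{"network": "cluster-network"}],        # vNIC on the pod VM logical network
            "guest": "pod-vm-kernel-with-container-engine",
            "storage": {"type": "ephemeral-virtual-disk"},
        }

    print(pod_spec_to_vm_spec({"resources": {"requests": {"cpu": 2}}}))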

At step 610, PLC 330 invokes VM management server 116 to deploy pod VM 130 to a host 120 corresponding to the selected node. At step 612, VM management server 116 cooperates with host daemon 214 in host 120 corresponding to the selected node to create and power on pod VM 130.

FIG. 7 is a flow diagram depicting a method 700 of application orchestration in a virtualized computing system according to another embodiment. Method 700 can be performed by VI control plane 113 and orchestration control plane 115, which comprise software executing on CPU, memory, storage, and network resources managed by a virtualization layer (e.g., a hypervisor) and/or host operating system. Method 700 begins at step 702, where a user provides specification(s) to API server 302 in orchestration control plane 115 to create new pod VM(s) 130 and new native VM(s) 140. At step 704, orchestration control plane 115 executes deployment of each native VM 140 as described above with respect to FIG. 5. At step 706, orchestration control plane 115 executes deployment of each pod VM 130 as described above with respect to FIG. 6. In this manner, one or more hosts 120 in host cluster 118 execute native VMs 140 alongside pod VMs 130, all of which are deployed and managed by orchestration control plane 115 in cooperation with VI control plane 113.

One or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for required purposes, or the apparatus may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. Various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, etc.

One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology that embodies computer programs in a manner that enables a computer to read the programs. Examples of computer readable media are hard drives, NAS systems, read-only memory (ROM), RAM, compact disks (CDs), digital versatile disks (DVDs), magnetic tapes, and other optical and non-optical data storage devices. A computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, certain changes may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation unless explicitly stated in the claims.

Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, non-hosted embodiments, or as embodiments that blur distinctions between the two. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.

Many variations, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest OS that perform virtualization functions.

Plural instances may be provided for components, operations, or structures described herein as a single instance. Boundaries between components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention. In general, structures and functionalities presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionalities presented as a single component may be implemented as separate components. These and other variations, additions, and improvements may fall within the scope of the appended claims.

What is claimed is:
 1. A virtualized computing system, comprising: a host cluster having a virtualization layer executing on hardware platforms of hosts, the virtualization layer supporting execution of virtual machines (VMs), the VMs including first VMs and second VMs, the first VMs including container engines supporting execution of containers in the first VMs, the second VMs including non-virtualized guest operating systems; a virtualization management server configured to manage the virtualization layer and the host cluster; and an orchestration control plane integrated with the virtualization layer, the orchestration control plane including: a lifecycle controller configured to cooperate with the virtualization layer to manage lifecycles of the first VMs, and a VM controller configured to cooperate with the virtualization management server to manage lifecycles of the second VMs.
 2. The virtualized computing system of claim 1, wherein the second VMs execute non-containerized applications.
 3. The virtualized computing system of claim 1, wherein the orchestration control plane includes custom APIs to manage objects monitored by the VM controller.
 4. The virtualized computing system of claim 3, wherein the objects include VM objects for the second VMs and VM image objects for VM images of guest software executing in the second VMs, the guest software including the non-virtualized guest operating system.
 5. The virtualized computing system of claim 3, wherein the objects include VM service objects for exposing network services of the second VMs.
 6. The virtualized computing system of claim 3, wherein the objects include virtual network resource objects for representing networks consumed by the second VMs.
 7. The virtualized computing system of claim 1, wherein the VM controller is configured to communicate with a controller in the virtualization layer to provide decoupled information to the second VMs.
 8. A method of application orchestration in a virtualized computing system including a host cluster having a virtualization layer directly executing on hardware platforms of hosts and a virtualization management server configured to manage the virtualization layer and the hosts, the virtualization layer supporting execution of virtual machines (VMs), the virtualization layer integrated with an orchestration control plane, the method comprising: receiving, at the orchestration control plane, specification data for a first application and a second application; deploying, by a lifecycle controller executing in the orchestration control plane, the first application to a first VM in a host of the host cluster based on the specification data, the first VM including a container engine supporting execution of containers in the first VM; and deploying, by a VM controller executing in the orchestration control plane and in cooperation with the virtualization management server, the second application to a second VM in the host, the second VM executing on the virtualization layer in parallel with the first VM.
 9. The method of claim 8, wherein the specification data specifies a VM resource referencing a VM image resource for a VM image of guest software executing in the second VM.
 10. The method of claim 8, wherein the specification data specifies a VM resource referencing a VM profile resource having attributes of the second VM.
 11. The method of claim 8, wherein the specification data specifies a VM resource referencing a network resource for a virtual network connected to the second VM.
 12. The method of claim 8, wherein the step of deploying comprises: cloning the second VM from a VM image referenced in the specification data; applying policies to the second VM based on the specification data; and starting the second VM on a selected host of the host cluster.
 13. The method of claim 8, further comprising: receiving decoupled information at a management agent in the virtualization layer from the orchestration control plane through the VM controller; and providing the decoupled information for consumption by the second application executing in the second VM, the decoupled information including at least one of configuration information and secret information.
 14. The method of claim 8, wherein the second application in the second VM is non-containerized.
 15. A non-transitory computer readable medium comprising instructions to be executed in a computing device to cause the computing device to carry out a method of application orchestration in a virtualized computing system including a host cluster having a virtualization layer directly executing on hardware platforms of hosts and a virtualization management server configured to manage the virtualization layer and the hosts, the virtualization layer supporting execution of virtual machines (VMs), the virtualization layer integrated with an orchestration control plane, the method comprising: receiving, at the orchestration control plane, specification data for a first application and a second application; deploying, by a lifecycle controller (PLC), the first application to a first VM in a host of the host cluster based on the specification data, the first VM including a container engine supporting execution of containers in the first VM; and deploying, by a VM controller executing in the orchestration control plane and in cooperation with the virtualization management server, the second application to a second VM in the host, the second VM executing on the virtualization layer in parallel with the first VM.
 16. The non-transitory computer readable medium of claim 15, wherein the specification data specifies a VM resource referencing a VM image resource for a VM image of guest software executing in the second VM.
 17. The non-transitory computer readable medium of claim 15, wherein the specification data specifies a VM resource referencing a VM profile resource having attributes of the second VM.
 18. The non-transitory computer readable medium of claim 15, wherein the specification data specifies a VM resource referencing a network resource for a virtual network connected to the second VM.
 19. The non-transitory computer readable medium of claim 15, wherein the step of deploying comprises: cloning the second VM from a VM image referenced in the specification data; applying policies to the second VM based on the specification data; and starting the second VM on a selected host of the host cluster.
 20. The non-transitory computer readable medium of claim 15, wherein the second application in the second VM is non-containerized.